The map shows where the data were collected in 2002 and 2022 within the state of California.
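# Count missing PM2.5 values and their share of all rows.
# Note: no column is named `DailyMeanPM2.5Concentration` (the column is
# `Daily Mean PM2.5 Concentration`), so `$` returns NULL and the count is 0.
missing_pm25 <- sum(is.na(combined_20022022$DailyMeanPM2.5Concentration))
prop_missing_pm25 <- missing_pm25 / nrow(combined_20022022)
print("Missing Values in PM2.5:")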
[1] "Missing Values in PM2.5:"
print(missing_pm25)
[1] 0
print("Proportion of Missing Values:")
[1] "Proportion of Missing Values:"
print(prop_missing_pm25)
[1] 0
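# Note: `DailyMeanPM2.5Concentration` matches no column, so summary()
# receives NULL, which is why the output below has length 0.
summary_pm25 <- summary(combined_20022022$DailyMeanPM2.5Concentration)
print("Summary Statistics for PM2.5:")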
[1] "Summary Statistics for PM2.5:"
print(summary_pm25)
Length Class Mode
0 NULL NULL
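The `Length 0, Class NULL` summary above is what R prints for `summary(NULL)`, which suggests the `$` lookup did not find the PM2.5 column. A minimal sketch of the same check with the column referenced by its full name in backticks is shown here. The data frame `toy` and its values are made up for illustration and are not the report's data.

```r
# Toy data standing in for combined_20022022 (values are made up).
toy <- data.frame(
  `Daily Mean PM2.5 Concentration` = c(12.1, NA, 8.4, 30.2),
  check.names = FALSE  # keep the spaces in the column name
)

# A name that matches no column yields NULL, so this count is always 0.
wrong <- sum(is.na(toy$DailyMeanPM2.5Concentration))

# Backticks (or [[ ]]) reference the actual column.
pm25 <- toy$`Daily Mean PM2.5 Concentration`
missing_pm25 <- sum(is.na(pm25))
prop_missing_pm25 <- missing_pm25 / nrow(toy)

print(wrong)              # 0
print(missing_pm25)       # 1
print(prop_missing_pm25)  # 0.25
print(summary(pm25))
```

Run against the real data, the backticked form would report the actual number of missing PM2.5 values and a numeric summary instead of an empty one.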
library(ggplot2)
library(dplyr)

# on a state level
combined_20022022 <- combined_20022022 %>%
  rename(PM2.5 = `Daily Mean PM2.5 Concentration`)

ggplot(combined_20022022, aes(x = year, y = PM2.5)) +
  geom_line(stat = "summary", fun = "mean") +
  labs(title = "PM2.5 Concentration in California by Year",
       x = "Year",
       y = "PM2.5 Concentration")
summary_state <- aggregate(PM2.5 ~ year, data = combined_20022022, FUN = mean)
print(summary_state)
year PM2.5
1 2002 16.115943
2 2022 8.564708
The line graph shows that mean PM2.5 concentration in California was lower in 2022 than in 2002. Because only these two years are plotted, the line joins two points and says nothing about the years in between. The summary statistics agree: the statewide mean fell from 16.115943 in 2002 to 8.564708 in 2022.
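To put the size of the change in context, the two statewide means reported above can be turned into an absolute and a relative decrease. This is a small arithmetic sketch using only the two printed values; the variable names are chosen here for illustration.

```r
# Statewide means copied from the summary_state output.
mean_2002 <- 16.115943
mean_2022 <- 8.564708

# Absolute and relative change between the two years.
abs_change   <- mean_2002 - mean_2022
pct_decrease <- 100 * abs_change / mean_2002

cat(sprintf("Absolute decrease: %.2f\n", abs_change))
cat(sprintf("Relative decrease: %.1f%%\n", pct_decrease))
```

The relative decrease comes out at a little under half of the 2002 level.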
# on a county level
# boxplot
combined_20022022 <- combined_20022022 %>%
  rename(county = `COUNTY`)

ggplot(combined_20022022, aes(x = county, y = PM2.5)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  labs(title = "PM2.5 Distribution by County",
       x = "County",
       y = "Mean PM2.5 Concentration")
# A tibble: 51 × 4
county mean_PM2.5 median_PM2.5 sd_PM2.5
<chr> <dbl> <dbl> <dbl>
1 Alameda 8.81 7.2 6.21
2 Butte 8.71 6 8.90
3 Calaveras 6.60 5.3 4.71
4 Colusa 8.40 7 6.32
5 Contra Costa 9.95 7.8 8.92
6 Del Norte 4.75 4.05 3.43
7 El Dorado 4.47 3.1 7.21
8 Fresno 12.3 8.4 12.1
9 Glenn 5.34 4.4 4.98
10 Humboldt 7.11 6 4.45
# ℹ 41 more rows
Based on the county summary, El Dorado County has the lowest mean PM2.5 (4.471330) and Kern County has the highest (15.594534). In the box plot, the two most extreme PM2.5 outliers are in Placer and Siskiyou counties, at around 300.
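The code that produced the county summary table is not shown. A table with the same columns (`mean_PM2.5`, `median_PM2.5`, `sd_PM2.5`) could be built along these lines with dplyr. This is a sketch under assumptions: `toy` and its values are made up, and the report's actual call may differ.

```r
library(dplyr)

# Toy data standing in for combined_20022022 (values are made up).
toy <- data.frame(
  county = c("A", "A", "A", "B", "B", "B"),
  PM2.5  = c(4, 6, 8, 10, 20, 30)
)

# One row per county with the mean, median, and standard deviation.
county_summary <- toy %>%
  group_by(county) %>%
  summarise(
    mean_PM2.5   = mean(PM2.5, na.rm = TRUE),
    median_PM2.5 = median(PM2.5, na.rm = TRUE),
    sd_PM2.5     = sd(PM2.5, na.rm = TRUE)
  )

# Counties with the lowest and highest mean.
lowest  <- county_summary %>% filter(mean_PM2.5 == min(mean_PM2.5))
highest <- county_summary %>% filter(mean_PM2.5 == max(mean_PM2.5))

print(county_summary)
print(lowest$county)
print(highest$county)
```

Filtering on the minimum and maximum of `mean_PM2.5` avoids reading the extremes off a truncated printout, which matters when the tibble shows only its first 10 rows.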
# for sites in LA
library(data.table)
library(tidyverse)
la_2002 <- data.table::fread("/Users/sabrinayang/Downloads/la_2002.csv")
la_2022 <- data.table::fread("/Users/sabrinayang/Downloads/la_2022.csv")

# Check dimensions
dim(la_2002)
[1] 2349 20
dim(la_2022)
[1] 6016 20
# Check the first few rows (headers) for each dataset
head(la_2002)
Date Source Site ID POC Daily Mean PM2.5 Concentration UNITS
1: 01/01/2002 AQS 60370002 1 32.3 ug/m3 LC
2: 01/02/2002 AQS 60370002 1 57.2 ug/m3 LC
3: 01/03/2002 AQS 60370002 1 39.2 ug/m3 LC
4: 01/04/2002 AQS 60370002 1 23.2 ug/m3 LC
5: 01/05/2002 AQS 60370002 1 7.3 ug/m3 LC
6: 01/07/2002 AQS 60370002 1 7.3 ug/m3 LC
DAILY_AQI_VALUE Site Name DAILY_OBS_COUNT PERCENT_COMPLETE
1: 93 Azusa 1 100
2: 152 Azusa 1 100
3: 110 Azusa 1 100
4: 74 Azusa 1 100
5: 30 Azusa 1 100
6: 30 Azusa 1 100
AQS_PARAMETER_CODE AQS_PARAMETER_DESC CBSA_CODE
1: 88101 PM2.5 - Local Conditions 31080
2: 88101 PM2.5 - Local Conditions 31080
3: 88101 PM2.5 - Local Conditions 31080
4: 88101 PM2.5 - Local Conditions 31080
5: 88101 PM2.5 - Local Conditions 31080
6: 88101 PM2.5 - Local Conditions 31080
CBSA_NAME STATE_CODE STATE COUNTY_CODE
1: Los Angeles-Long Beach-Anaheim, CA 6 California 37
2: Los Angeles-Long Beach-Anaheim, CA 6 California 37
3: Los Angeles-Long Beach-Anaheim, CA 6 California 37
4: Los Angeles-Long Beach-Anaheim, CA 6 California 37
5: Los Angeles-Long Beach-Anaheim, CA 6 California 37
6: Los Angeles-Long Beach-Anaheim, CA 6 California 37
COUNTY SITE_LATITUDE SITE_LONGITUDE
1: Los Angeles 34.1365 -117.9239
2: Los Angeles 34.1365 -117.9239
3: Los Angeles 34.1365 -117.9239
4: Los Angeles 34.1365 -117.9239
5: Los Angeles 34.1365 -117.9239
6: Los Angeles 34.1365 -117.9239
head(la_2022)
Date Source Site ID POC Daily Mean PM2.5 Concentration UNITS
1: 01/05/2022 AQS 60370002 1 10.7 ug/m3 LC
2: 01/11/2022 AQS 60370002 1 3.1 ug/m3 LC
3: 01/17/2022 AQS 60370002 1 11.9 ug/m3 LC
4: 01/23/2022 AQS 60370002 1 3.5 ug/m3 LC
5: 01/26/2022 AQS 60370002 1 3.4 ug/m3 LC
6: 01/29/2022 AQS 60370002 1 4.3 ug/m3 LC
DAILY_AQI_VALUE Site Name DAILY_OBS_COUNT PERCENT_COMPLETE
1: 45 Azusa 1 100
2: 13 Azusa 1 100
3: 50 Azusa 1 100
4: 15 Azusa 1 100
5: 14 Azusa 1 100
6: 18 Azusa 1 100
AQS_PARAMETER_CODE AQS_PARAMETER_DESC CBSA_CODE
1: 88101 PM2.5 - Local Conditions 31080
2: 88101 PM2.5 - Local Conditions 31080
3: 88101 PM2.5 - Local Conditions 31080
4: 88101 PM2.5 - Local Conditions 31080
5: 88101 PM2.5 - Local Conditions 31080
6: 88101 PM2.5 - Local Conditions 31080
CBSA_NAME STATE_CODE STATE COUNTY_CODE
1: Los Angeles-Long Beach-Anaheim, CA 6 California 37
2: Los Angeles-Long Beach-Anaheim, CA 6 California 37
3: Los Angeles-Long Beach-Anaheim, CA 6 California 37
4: Los Angeles-Long Beach-Anaheim, CA 6 California 37
5: Los Angeles-Long Beach-Anaheim, CA 6 California 37
6: Los Angeles-Long Beach-Anaheim, CA 6 California 37
COUNTY SITE_LATITUDE SITE_LONGITUDE
1: Los Angeles 34.1365 -117.9239
2: Los Angeles 34.1365 -117.9239
3: Los Angeles 34.1365 -117.9239
4: Los Angeles 34.1365 -117.9239
5: Los Angeles 34.1365 -117.9239
6: Los Angeles 34.1365 -117.9239
Based on the site summary table, Lebec has the lowest mean PM2.5 (4.439333) and Burbank has the highest (23.969672). The boxplot shows the distribution of daily PM2.5 at each LA site, including outliers, and the histogram shows how often each concentration range occurs at each site.
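The site-level summary code is not shown either. One way to get per-site means and pick out the lowest and highest site is sketched below in base R, using the `Site Name` and `Daily Mean PM2.5 Concentration` column names visible in the `head()` output. The data frame `toy`, its site labels, and its values are made up for illustration.

```r
# Toy data standing in for the LA site data (values are made up).
toy <- data.frame(
  `Site Name` = c("Site A", "Site A", "Site B", "Site B", "Site B"),
  `Daily Mean PM2.5 Concentration` = c(5, 7, 20, 25, 30),
  check.names = FALSE  # keep the spaces in the column names
)

# Mean PM2.5 per site; extra arguments to tapply() are passed on to mean().
site_means <- tapply(
  toy$`Daily Mean PM2.5 Concentration`,
  toy$`Site Name`,
  mean,
  na.rm = TRUE
)

# Names of the sites with the lowest and highest mean.
lowest_site  <- names(site_means)[which.min(site_means)]
highest_site <- names(site_means)[which.max(site_means)]

print(site_means)
print(lowest_site)
print(highest_site)
```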